WO 2004/013777

PCT/US2003/025271

SYSTEM AND METHOD OF PARALLEL PATTERN MATCHING



This application claims the benefit of U.S. provisional application number 60/401457
filed on August 05, 2002, incorporated herein by reference in its entirety.

Field of The Invention 

The field of the invention is electronic file searching.

Background of The Invention 

Pattern matching may be defined as an activity which involves searching or scanning any
type of data which can be stored or transmitted in digital format. A common type of pattern
matching is searching for text in a file. Construction of a "machine" for pattern matching can be
relatively intuitive. For example, given the pattern "abc", we would look at every character in
the file, initially expecting an 'a'. If found, we would then examine the next character, expecting
a 'b'. If a 'b' was found, we would then expect a 'c'. If, at any point, we do not find what we
expect, we return to the expectation of an 'a'.
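The "abc" machine just described can be sketched in a few lines of Python. This is only an illustration: the function name `find_abc` is hypothetical, and the sketch assumes that a character causing a mismatch is itself re-examined as a possible new 'a', which the text does not state explicitly.

```python
def find_abc(text):
    """Return the 0-based start index of every "abc" in text."""
    pattern = "abc"
    expected = 0          # index into pattern of the character we expect next
    matches = []
    for pos, ch in enumerate(text):
        if ch == pattern[expected]:
            expected += 1
        else:
            # Mismatch: return to expecting 'a'. Assumption: the character
            # that caused the mismatch may itself be that 'a'.
            expected = 1 if ch == pattern[0] else 0
        if expected == len(pattern):
            matches.append(pos - len(pattern) + 1)
            expected = 0
    return matches
```

This shortcut is adequate for "abc" because no proper prefix of the pattern reappears inside it; patterns that overlap themselves or each other need the fuller construction described later.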

We have just begun to describe a finite state automaton - which generally comprises the 
following five components: 

1. a finite alphabet (e.g. the ASCII characters);

2. a finite set of patterns (e.g. "abc", ...);

3. a finite set of states (e.g. one for each of 'a', 'b', and 'c'). For each pattern, we
may also define a final ("accepting") state, which we enter upon having matched that
pattern (e.g. "abc");

4. one designated initial state; and

5. a move function that defines how the automaton changes state as it processes
an input stream (described above).

Parallelism In Pattern Matching

The notion of parallelism in pattern matching has to do with subpatterns, in particular,
subpatterns of the type in which one or more consecutive elements, starting with the first
element, occur (in sequence) in a second pattern. There are typically two ways in which this can
manifest:

Case 1. The first N elements of pattern1 are also the first N elements of pattern2
(N >= 1). For example, "air", "airplane".

Case 2. The subpattern consisting of the first N elements of pattern1 appears in
pattern2, but does not include the first element of pattern2.
For example, "eel" and "feeler".
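The two cases can be told apart mechanically. The following Python sketch is illustrative only; the helper names `shared_prefix_length` and `embedded_prefix` are hypothetical and do not appear in the specification.

```python
def shared_prefix_length(p1, p2):
    """Case 1: length N of the longest common leading run of p1 and p2."""
    n = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        n += 1
    return n


def embedded_prefix(p1, p2):
    """Case 2: longest prefix of p1 that occurs in p2 at an offset >= 1.

    Returns (N, offset), or (0, -1) if no prefix of p1 occurs inside p2.
    """
    for n in range(len(p1), 0, -1):
        offset = p2.find(p1[:n], 1)   # start at 1 to skip p2's first element
        if offset != -1:
            return n, offset
    return 0, -1
```

For the examples in the text, `shared_prefix_length("air", "airplane")` gives 3 and `embedded_prefix("eel", "feeler")` gives `(3, 1)`.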

Finite state automata (FSA, or state machines) are typically represented as directed graphs
(also called state transition diagrams). This type of diagram preferably has a root node, which
represents the initial state, and edges (or transitions) connecting the nodes, labeled with the
input which will trigger each transition.

An existing pattern matching algorithm is that developed by Aho & Corasick and later
improved upon by Commentz-Walter. The Commentz-Walter method is commonly known as
fgrep. Fgrep uses hashing to skip over areas in the text where no matches are possible. All
commonly implemented methods of pattern matching use either the original Aho & Corasick
implementation of the finite state automaton or the fgrep method of partial FSA implementation.

There is a need, however, to simplify the FSA, making it so fast that it is as good as
hashing or other skipping methods in the regions without matches, yet faster than Aho & Corasick
where matches are found.

Summary of the Invention

The present invention provides systems and methods for creating a finite state automaton
(FSA) that matches patterns in parallel, including the steps of creating states of the automaton
from a set of patterns to be matched and passing over the patterns a second time, adding
transitions to the states to match all the possible patterns that can start within the pattern.

Another aspect is directed toward an FSA that uses array-based transitions. The system
includes an alphabet of size N in which each state is represented by an object containing an array
of N pointers to possible successive states, and wherein the numeric value of each member of the
alphabet is then used as an offset into the array to point to the next state for the input.

Yet a further aspect is directed toward creating a case-insensitive FSA by making each
pattern all one case and, after having created the FSA, adding corresponding transitions on each
alphabetic character so that the case of characters in the input stream will have no effect on the
performance of the FSA.

Various objects, features, aspects and advantages of the present invention will become
more apparent from the following detailed description of preferred embodiments of the
invention, along with the accompanying drawings in which like numerals represent like
components.

Brief Description of The Drawings 

Fig. 1 is a FSA shown as a directed graph. 

Fig. 2 is a partially built FSA.

Fig. 3 is a diagram demonstrating steps of the FSA.

Fig. 4 is a state transition diagram for a completed FSA. 

Detailed Description 

Referring first to Fig. 1, a state transition diagram 100 for a partially constructed FSA for
matching "free" and "eel" is generally comprised of circles 110-180 which represent the states of
the machine, with the double circles (150 and 180) representing final (or accepting) states.
Implicit in these diagrams is that, for any state, for any input other than those shown, there is a
transition to the initial state.

The arrows between the circles represent the transitions or edges, and, in this document,
the numbers in the circles simply represent the order of creation. Since a state can have at most
one transition for a given input, the set of all possible transitions of a particular FSA can be
described as a set of ordered pairs of the type {input, state}.

Definitions

For the remainder of this document, we will use the following conventions:

∃ is equivalent to "there exists".

| is equivalent to "such that".

∈ is equivalent to "element of".

The '.' operator is used to indicate an attribute (data or function) of an object.

ai means "the ith element of A", e.g. "∃ t(ai, s) | t(ai, s) ∈ s.T" means "there exists a
t(ai, s) such that t(ai, s) is a member of s.T".

<= is the assignment operator, meaning "becomes".

= is the equivalence operator.

null represents the empty state (or "no state").

Ellipsis "..." indicates zero or more unspecified parameters.

Exemplary pseudocode is one-based, i.e. "for i <= 1 until length (p)" means for each i
starting with the first element up to the length of p. (This convention differs from languages such
as C++, C, and Java, which are all zero-based.)

F denotes the FSA.

S denotes the set of all states in machine.

s0 is a special identifier for the initial state of machine, which is equivalent to S0.

A denotes the alphabet, i.e. the finite set of all possible inputs to machine.

We define a transition t(a, s) as an ordered pair | a ∈ A and s ∈ S.

Each state s of S has a (possibly empty) set of transitions, denoted by "s.T".

Note that the action of any machine upon matching a pattern will be application-specific,
e.g. a text search engine might simply list the number of occurrences of each pattern. A virus
detection engine might create another thread or process to quarantine or remove the file being
searched, etc. We will, therefore, only refer to two unspecified functions, setAction() and
doAction(), which are simply placeholders for specific functions which define and invoke the
action to be taken.

The move() function may be defined as a member of the class state. For any input a in
A, if s is the current state, we call s.move(a) to determine the next state:

    function move (ai)
    begin
        if ∃ t(ai, s') | t(ai, s') ∈ s.T
            return s';
        else
            return null;
        endif
    end
Array-based Implementation 

For alphabets of up to 256 elements, we may implement each state's set of transitions as
an array (a fixed block of contiguous memory) whose length is the size of the alphabet (128 for
ASCII, 256 for binary searches). This array may contain pointers to states, and initially will
typically contain all zeros (null pointers) if we are building a non-deterministic FSA. (When
building a deterministic FSA, we will create the initial state first, and initialize its array, and the
arrays of all subsequently created states, to the address of the initial state.) This strategy allows
us the fastest possible state lookup, simply using the numeric value of the input character as an
offset into the current state's array to determine the next state.
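As an illustration of the array layout, here is a minimal Python sketch. The class name `State`, its `default` parameter and the constant `ALPHABET_SIZE` are assumptions made for this example; a Python list stands in for the fixed block of pointers.

```python
ALPHABET_SIZE = 128  # 7-bit ASCII; 256 would cover arbitrary bytes


class State:
    """A state whose transitions live in a fixed-length array."""

    def __init__(self, default=None):
        # default is None for a non-deterministic FSA; for a deterministic
        # FSA every slot starts out pointing at the initial state instead.
        self.array = [default] * ALPHABET_SIZE

    def move(self, ch):
        # The numeric value of the input is the offset into the array.
        return self.array[ord(ch) & (ALPHABET_SIZE - 1)]


# Non-deterministic flavor: empty slots are None.
s0 = State()
s1 = State()
s0.array[ord("a")] = s1

# Deterministic flavor: empty slots point back at the initial state.
d0 = State()
d0.array = [d0] * ALPHABET_SIZE
d1 = State(default=d0)
d0.array[ord("a")] = d1
```

In both flavors a lookup is a single index operation, which is the property the text relies on.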

The following two examples depict the implementation of our array-based approach to
transitions:

7-bit (ASCII) move():

    function move (ai)
    begin
        return Array [ai & 127];
    end

8-bit (binary) move():

    function move (ai)
    begin
        return Array [ai];
    end

We define a second member function of state, addTransition(), as follows. Using our
system, one has the option of creating a non-deterministic FSA and then converting it to a
deterministic FSA, if so desired, or simply building a deterministic FSA from the beginning. In
the following pseudocode, there will be minor differences, depending upon which type of FSA
we are building. We use the following conventions to indicate which type we are creating:

Text in italics is specific to non-deterministic FSA only.

Underlined text indicates pseudocode specific to deterministic FSA only.

Where there is a pair of lines of the above format, one would be used, depending upon the type of
FSA.



    function addTransition (ai, s2)
    begin
        if Array [ai] = null        (non-deterministic FSA)
        if Array [ai] = s0          (deterministic FSA)
            Array [ai] <= s2;
        endif
    end

Note: addTransition() ensures that there can be only one transition from state s on input a.
Once entered, that transition will not change.
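The write-once behavior noted above can be sketched in Python as follows. The free function `add_transition` and its `empty` parameter are hypothetical names for this illustration; strings stand in for state objects.

```python
def add_transition(array, ch, target, empty=None):
    """Write-once insert: set array[ord(ch)] to target only if the slot is empty.

    `empty` is None for a non-deterministic FSA; for a deterministic FSA it
    would be the initial state. Returns True if the transition was installed.
    """
    index = ord(ch)
    if array[index] is empty:
        array[index] = target
        return True
    return False


transitions = [None] * 128            # one state's array, 7-bit ASCII
first = add_transition(transitions, "e", "state 5")
second = add_transition(transitions, "e", "state 6")   # ignored: slot taken
```

After these two calls the slot for 'e' still holds "state 5", so the earlier transition survives.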

Constructing the Machine

1. Creating the Graph

We may now define the following functions for building our FSA from a set of patterns,
P. Generally, we first define a function, CreateGraph, which, for each pattern, p, in our set of
patterns, calls the following function, createGraph (p).

    function createGraph (p)
    begin
        if s0 = null
            s0 <= new state;
        endif
        state currentState <= s0;
        state nextState <= null;
        for i <= 1 until length (p)
            nextState <= currentState.move (pi);
            if nextState = null         (non-deterministic FSA)
            if nextState = s0           (deterministic FSA)
                nextState <= new state;
                currentState.addTransition (pi, nextState);
            endif
            currentState <= nextState;
        endfor
        currentState.setAction (...);
    end
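A minimal Python sketch of this first pass follows, under two stated assumptions: a dictionary replaces the transition array for brevity (the non-deterministic variant, where a missing key plays the role of null), and an `accepts` attribute stands in for the unspecified setAction(). The names `State` and `create_graph` are illustrative.

```python
class State:
    def __init__(self, number):
        self.number = number      # order of creation, as in the figures
        self.T = {}               # transitions: input character -> State
        self.accepts = None       # pattern matched on entering this state


def create_graph(patterns):
    """First pass: one path of states per pattern, sharing common prefixes."""
    states = [State(0)]
    for p in patterns:
        current = states[0]
        for ch in p:
            nxt = current.T.get(ch)
            if nxt is None:                      # no transition yet: new state
                nxt = State(len(states))
                states.append(nxt)
                current.T[ch] = nxt
            current = nxt
        current.accepts = p                      # stand-in for setAction(...)
    return states


states = create_graph(["free", "eel"])
edges = sum(len(s.T) for s in states)
```

For "free" and "eel" this yields eight states and seven transitions, with state 4 accepting "free" and state 7 accepting "eel".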



2. Completing the graph

If we consider the two necessary and sufficient conditions for a pattern matching
FSA, we will find that, having called CreateGraph, we have created an FSA which will satisfy
case 1, above. We will see that we have already created enough states to satisfy both case 1 and
case 2, above. All that remains is to add any missing transitions. That is to say, whenever a
pattern (or any first portion of a pattern) appears as a subpattern of another, we add the
appropriate transitions to the states that match the containing pattern so that the subpattern will
not be missed. Since the patterns to be matched, in combination with the transitions of the initial
state, typically contain all the information needed to determine any necessary additional
transitions, the most direct approach to completing the graph is to pass each pattern through our
partially constructed machine as follows:

We define a second function, CompleteGraph(), which, in turn, calls completeGraph (P)
for each of the patterns. In completeGraph, we make a second pass over P, starting with its
transition out of the initial state to the next state, which expects p2, the second element of P. We
then move to the second state, as dictated by p2. At this point we check for a transition out of
state 0 on element p2. If found, we add all transitions in that state to our current state, and
enqueue that state to be examined in the next iteration. We also check each of the previous states
in our queue, if any, to see if there is a move from that state on the current element. If so, we add
the edges from the state moved to, and enqueue that state. We repeat this process until reaching
the end of our pattern.



    function completeGraph (p)
    begin
        queue parallelMatches;
        state currentState <= s0.move (p1);
        state temp <= null;
        for i <= 2 until length (p)
            currentState <= currentState.move (pi);
            int qlen <= parallelMatches.length;
            for j <= 1 until qlen
                temp <= parallelMatches.dequeue ();
                temp <= temp.move (pi);
                if temp ≠ null          (non-deterministic FSA)
                if temp ≠ s0            (deterministic FSA)
                    currentState.addEdges (temp);
                    parallelMatches.insert (temp);
                endif
            endfor
            temp <= s0.move (pi);
            if temp ≠ null              (non-deterministic FSA)
            if temp ≠ s0                (deterministic FSA)
                currentState.addEdges (temp);
                parallelMatches.insert (temp);
            endif
        endfor
    end

The following function of state, addEdges, is called by completeGraph().



    function addEdges (state source)
    begin
        for i <= 1 until length (Alphabet)
            state tmp <= source.move (ai);
            if tmp ≠ null               (non-deterministic FSA)
            if tmp ≠ s0                 (deterministic FSA)
                addTransition (ai, tmp);
            endif
        endfor
    end

Note that transitions will be added only if there is a null (non-deterministic FSA) or s0
(deterministic FSA) transition on the given character.
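The two passes can be put together in a short Python sketch. It is a reading of the procedure described above for the non-deterministic case, not a definitive implementation: dictionaries replace arrays, a list replaces the queue, `accepts` stands in for the action set by setAction(), and all names (`State`, `create_graph`, `complete_graph`, `build`) are illustrative. Patterns are assumed to be non-empty.

```python
class State:
    def __init__(self, number):
        self.number = number
        self.T = {}            # input character -> State (absent = null)
        self.accepts = set()   # patterns recognized on entering this state

    def move(self, ch):
        return self.T.get(ch)

    def add_edges(self, source):
        # Copy source's transitions without overwriting existing ones, and
        # inherit its accepting information (as in the note-card description).
        for ch, target in source.T.items():
            self.T.setdefault(ch, target)
        self.accepts |= source.accepts


def create_graph(s0, states, p):
    current = s0
    for ch in p:
        nxt = current.move(ch)
        if nxt is None:
            nxt = State(len(states))
            states.append(nxt)
            current.T[ch] = nxt
        current = nxt
    current.accepts.add(p)


def complete_graph(s0, p):
    parallel = []                       # states of matches running alongside p
    current = s0.move(p[0])
    for ch in p[1:]:
        current = current.move(ch)
        still_alive = []
        for state in parallel:          # advance every parallel match
            moved = state.move(ch)
            if moved is not None:
                current.add_edges(moved)
                still_alive.append(moved)
        restart = s0.move(ch)           # a match could also start right here
        if restart is not None:
            current.add_edges(restart)
            still_alive.append(restart)
        parallel = still_alive


def build(patterns):
    s0 = State(0)
    states = [s0]
    for p in patterns:
        create_graph(s0, states, p)
    for p in patterns:
        complete_graph(s0, p)
    return states


states = build(["free", "eel"])
```

For "free" and "eel" the second pass adds three transitions (state 4 on 'l' to 7, state 4 on 'e' to 6, state 6 on 'e' to 6), giving ten in total.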



Completed Non-Deterministic FSA 

Note that states are generally created in the createGraph function, and these states may be
all that are needed. We have now a fully functional FSA with the minimal number of states and
transitions. In fact, for a non-deterministic FSA, we may have transitions on only a few of the 128
(or 256) possible elements of our alphabet. Therefore, we may make multiple (two, to be exact)
transitions on many inputs. This characteristic is precisely what makes it non-deterministic. A
non-deterministic FSA is capable of quite efficiently matching any number of patterns in parallel
using the function nfaSearch(), or may be converted to a deterministic FSA, as will be shown
later. A separate search function, dfaSearch(), makes at most one move per input, as will also be
shown later.



Case Insensitivity in Text Searches

In text searches, it is often desirable to make the search case insensitive. We use the
following mechanism to attain this end with no loss in efficiency. First, all patterns are
converted to lower case before being added to the machine. Then, after running createGraph()
and completeGraph() on all patterns, we call makeCaseInsensitive() on machine, which, for each
state, for each transition on the set of characters a-z, adds a similar transition to the
corresponding upper case character.
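A minimal Python sketch of this step is shown here. It assumes each state is a 128-entry list whose slots hold the index of the target state (or None); the function name `make_case_insensitive` and the tiny "ab" machine are hypothetical and serve only to exercise the idea.

```python
import string

ALPHABET_SIZE = 128


def make_case_insensitive(states):
    """For each state, mirror every a-z transition onto the matching A-Z slot.

    `states` is a list of per-state transition arrays (lists of length 128)
    whose entries are indexes of target states, or None for "no transition".
    """
    for array in states:
        for lower in string.ascii_lowercase:
            target = array[ord(lower)]
            if target is not None and array[ord(lower.upper())] is None:
                array[ord(lower.upper())] = target


# Hypothetical two-transition machine for the (lower-cased) pattern "ab".
states = [[None] * ALPHABET_SIZE for _ in range(3)]
states[0][ord("a")] = 1
states[1][ord("b")] = 2
make_case_insensitive(states)


def run(text):
    """Follow transitions from state 0; return the state reached (or None)."""
    current = 0
    for ch in text:
        current = states[current][ord(ch)]
        if current is None:
            return None
    return current
```

Because the extra transitions are installed once, before searching, the search loop itself does no case conversion.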
Running a Non-Deterministic Automaton

By calling nfaSearch() on an input stream, we can match every occurrence of all patterns
entered by using the functions above, making, at most, 2 transitions on any given input:

    nfaSearch (input stream t)
    begin
        for i <= 1 until length (t)
            if currentState ≠ null
                currentState <= currentState.move (ti);
                if currentState ≠ null
                    if currentState.isAccepting
                        currentState.doAction (...);
                    endif
                else
                    currentState <= s0.move (ti);
                endif
            else
                currentState <= s0.move (ti);
            endif
        endfor
    end
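The search loop can be tried out in Python on the completed "free"/"eel" machine. In this sketch the ten transitions are written out by hand as nested dictionaries, and a list of `(end_index, pattern)` pairs stands in for doAction(); the names `T`, `ACCEPTING` and `nfa_search` are assumptions for the example.

```python
# Completed machine for "free" and "eel" (ten transitions), written out by
# hand as {state: {input: next_state}}; state numbers follow creation order.
T = {
    0: {"f": 1, "e": 5},
    1: {"r": 2},
    2: {"e": 3},
    3: {"e": 4},
    4: {"l": 7, "e": 6},
    5: {"e": 6},
    6: {"l": 7, "e": 6},
    7: {},
}
ACCEPTING = {4: "free", 7: "eel"}


def nfa_search(text):
    """Return (end_index, pattern) for every match, 0-based end positions."""
    matches = []
    current = None
    for i, ch in enumerate(text):
        if current is not None:
            current = T[current].get(ch)           # first move
            if current is not None:
                if current in ACCEPTING:
                    matches.append((i, ACCEPTING[current]))
            else:
                current = T[0].get(ch)             # second move, from s0
        else:
            current = T[0].get(ch)
    return matches
```

On the input "freel or eeel" this reports "free" ending at index 3 and "eel" ending at indexes 4 and 12, with no input element costing more than two lookups.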



Completing a Deterministic Automaton

If we have followed the steps above (following the pseudocode specific to creating a
deterministic FSA), then all that remains is the following:

    completeDfa ()
    begin
        for each state s | s ∈ S
            for each a | a ∈ A
                state tmp <= s.move (a);
                if tmp = s0
                    tmp <= s0.move (a);
                    if tmp ≠ s0
                        s.addTransition (a, tmp);
                    endif
                endif
            endfor
        endfor
    end

Having built a deterministic FSA, we now have, for each state in our machine, one
transition on each member of the alphabet. The number of states remains unchanged, but the
total number of transitions changes to the number of states multiplied by the size of the alphabet:
128 for purely ASCII searches, 256 for searches on all 8-bit entities.

Making a Non-Deterministic FSA Deterministic

To make a machine deterministic, we simply iterate through all states, and for all possible
inputs for which a state has no transition, if there is a non-null transition from the initial state on
that input, we add that transition to the current state. If not, we add a transition on that input to
the initial state:

    makeDeterministic ()
    begin
        for each state s | s ∈ S
            for each a | a ∈ A
                state tmp <= s.move (a);
                if tmp = null
                    tmp <= s0.move (a);
                    if tmp ≠ null
                        s.addTransition (a, tmp);
                    else
                        s.addTransition (a, s0);
                    endif
                endif
            endfor
        endfor
    end



Having built a deterministic FSA, we now have, for each state in our machine, transitions
on every member of the alphabet. The number of states remains unchanged, but the total number
of transitions changes to the number of states multiplied by the size of the alphabet: 128 for
purely ASCII searches, 256 for searches on all 8-bit entities. (If our alphabet were larger, for
example 64 kilobytes for 2-byte elements, we would probably use a non-deterministic machine
on current hardware.)

We now define a search function for our deterministic FSA. Here is the pseudocode:

    dfaSearch (input stream t)
    begin
        for i <= 1 until length (t)
            currentState <= currentState.move (ti);
            if currentState.isAccepting
                currentState.doAction (...);
            endif
        endfor
    end

(Note: currentState will be set to s0 before the first call to dfaSearch().)

The search function for our deterministic machine will make exactly one move for each
element in the input stream.
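Conversion and deterministic search can be sketched together in Python. The example starts from the completed non-deterministic "free"/"eel" machine written out by hand; `make_deterministic` and `dfa_search` are illustrative names, dictionaries stand in for the per-state arrays, and the input is assumed to be 7-bit ASCII.

```python
ALPHABET = [chr(c) for c in range(128)]      # 7-bit ASCII

# Completed non-deterministic machine for "free" and "eel", written by hand.
NFA = {
    0: {"f": 1, "e": 5}, 1: {"r": 2}, 2: {"e": 3}, 3: {"e": 4},
    4: {"l": 7, "e": 6}, 5: {"e": 6}, 6: {"l": 7, "e": 6}, 7: {},
}
ACCEPTING = {4: "free", 7: "eel"}


def make_deterministic(nfa):
    """Fill every missing transition with s0's transition, else with s0."""
    dfa = {}
    for state, transitions in nfa.items():
        row = {}
        for a in ALPHABET:
            if a in transitions:
                row[a] = transitions[a]
            else:
                row[a] = nfa[0].get(a, 0)    # s0's move on a, or s0 itself
        dfa[state] = row
    return dfa


def dfa_search(dfa, text):
    """Exactly one table lookup per input element."""
    matches = []
    current = 0                               # set to s0 before searching
    for i, ch in enumerate(text):
        current = dfa[current][ch]
        if current in ACCEPTING:
            matches.append((i, ACCEPTING[current]))
    return matches


DFA = make_deterministic(NFA)
total_transitions = sum(len(row) for row in DFA.values())
```

Here `total_transitions` is 8 states times 128 inputs, and the deterministic search finds the same three matches in "freel or eeel" as the two-move search does.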

Any pattern used to build a finite automaton using CreateGraph and CompleteGraph will
likely be matched if it occurs in an input stream.

Proof by mathematical induction:

Given a pattern P of length n occurring in input stream I:

1. p0 will be matched, since:

a. If the current state is s0, by createGraph (P), a transition t (p0, sp0) was
placed in s0.

b. If the current state is s ≠ s0, by completeGraph (P), s must either have a
transition t (p0, sp0), in which case it is matched, or not, in which case the
state will become s0 (in a DFA) and the transition from s0 will be made.

2. For any pk | k < n - 1, if pk is matched, pk+1 will be matched, since:
By createGraph and completeGraph, the current state, sk, reached by
recognition of pk, must have a transition, t (pk+1, sk+1), which will match pk+1.

A specialized approach for smaller alphabets 

For certain applications, the alphabet can be quite small. For example, a DNA molecule
can be thought of as a string over an alphabet of four characters {A, T, C, G} (nucleotides). For
RNA the characters are {A, C, G, U}. By masking off the 5 high-order bits, we get the following:

(In the syntax of C and C++, the "&" operator is used for masking off unwanted bits, in
that only the bits that are common to the numbers on either side of the & are looked at. The
binary notation for 7 is 00000111.)

    a & 0x07 = 1    c & 0x07 = 3    g & 0x07 = 7    t & 0x07 = 4
    A & 0x07 = 1    C & 0x07 = 3    G & 0x07 = 7    T & 0x07 = 4

(For RNA searches):

    u & 0x07 = 5
    U & 0x07 = 5

If we use only the three low-order bits, we can now reduce the size of each state's Array
to 8, or 1/16th the size required for a full ASCII search. We then change our move() and
addTransition() functions to ignore the 5 high-order bits, and the resulting FSA is much
more compact, giving the ability to search for many more patterns in parallel with much less
performance degradation due to memory usage. As a side effect, we also get built-in case
insensitivity.



    function move (ai)
    begin
        return Array [ai & 7];
    end

    function addTransition (ai, s2)
    begin
        int index <= ai & 7;
        if Array [index] = null     (non-deterministic FSA)
        if Array [index] = s0       (deterministic FSA)
            Array [index] <= s2;
        endif
    end
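The masked lookup can be checked with a few lines of Python. The helper `code`, the 8-slot lists and the single pattern "gat" are assumptions chosen for this illustration only.

```python
MASK = 0x07   # keep the three low-order bits


def code(ch):
    """Array offset for a nucleotide character under the 3-bit mask."""
    return ord(ch) & MASK


# One 8-slot transition array per state; hypothetical machine for "gat".
states = [[None] * 8 for _ in range(4)]
for source, ch in enumerate("gat"):
    states[source][code(ch)] = source + 1


def matches_at_start(text):
    """True if text begins with the pattern, ignoring case via the mask."""
    current = 0
    for ch in text:
        current = states[current][code(ch)]
        if current is None:
            return False
        if current == 3:
            return True
    return False
```

Upper- and lower-case nucleotides map to the same offsets (1, 3, 7, 4 and 5 for a, c, g, t and u), so "GATTACA" and "gattaca" follow the same path through the arrays.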



Extending our specialized approach to full ASCII searches by adding a hash function

A commonly used method of indexing databases is the hash function. A hash function
reduces a string of characters to a numeric value.

If we use a machine, such as the one described above, which examines a subset of
low-order bits of each input on an alphabet which includes all ASCII characters, we will get
"false" matches, e.g. each character in "abcdefghij" has the same 4 low-order bits as the
corresponding element of "qrstuvwxyz". (The values are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, respectively.)
If, however, we apply a hashing function to the two strings, we get values 306089158 for
"abcdefghij", and 1270619813 for "qrstuvwxyz".

Using our hash function, which can be case insensitive if we so desire, we can now derive
a numeric value of each of our patterns, and store it in the accepting state for that pattern. When
our machine has a match based upon the low-order bits, the hash function is then applied to the
characters in the input stream which caused the "partial" match, to determine whether it has an
exact match.
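The two-stage check can be illustrated in Python. The specification does not say which hash function produced the values quoted above, so `pattern_hash` here is a hypothetical stand-in (a base-31 polynomial reduced to 32 bits) and will not reproduce those numbers; `low_bits` and `confirmed_match` are likewise illustrative names.

```python
MASK = 0x0F            # four low-order bits, as in the "abcdefghij" example


def low_bits(text):
    return [ord(ch) & MASK for ch in text]


def pattern_hash(text):
    """Hypothetical stand-in hash: base-31 polynomial, reduced to 32 bits."""
    h = 0
    for ch in text:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def confirmed_match(pattern, candidate):
    """A low-bit ("partial") match counts only if the hashes also agree."""
    if low_bits(pattern) != low_bits(candidate):
        return False
    return pattern_hash(pattern) == pattern_hash(candidate)
```

Both example strings reduce to the values 1 through 10 under the mask, yet their hashes differ, so the false match is rejected in the second stage.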

This solution has the advantage of smaller arrays, requiring substantially less thne to 
create the FSA, the ability to do case-insensitive searches with no performance hit, and very little 
decrease in search speed compared to our version using arrays of 128 or 256 elements. 

Creating a 3-by-5 Card Prototype Using Our System

Let us begin by building a simple pattern-matching FSA using 3" by 5" note cards. (Paper
of any size would do.)

Our goal is to number cards as they are used to create states (we start with 0), and on the
cards add whatever transitions are needed. The transitions determine what the next state should
be for a given input. If a card represents an accepting state (one which denotes a match) we will
place an asterisk followed by the pattern matched.

Creating the Graph 

The first step in creating our machine is to define a set of patterns which we wish to
match. We will use the patterns "free" and "eel", for simplicity. First, we create an initial state
(state 0). We add the patterns one by one, adding a transition consisting of the first character of
our first pattern followed by the state to which to move. We continue until we have reached the
end of the pattern, marking the final state as accepting, with no transitions. After entering the
first pattern, we use existing states where appropriate, creating new states only when needed, and
marking the final state as accepting for each pattern. The cards are numbered for convenience,
by order of creation. The functionality of the machine does not depend on their numbers, but
the numbers help us to differentiate them in the diagrams.

The move function 

Referring to Fig. 2, an FSA machine 200 for "free" and "eel" generally comprises states 0-7
(210-280) and functions as follows: Start by placing a coin on state 0 (210), indicating that it is
the current state. Then scan a stream of text, and for each character, if there is a transition out of
state 0 (210) on that character, move the coin to the state indicated by that transition. Continue in
this way until reaching the end of the input stream. Whenever there is no transition out of a state
on a character, we may make two moves: first we move the coin to state 0 (210) and, if there is
a transition out of state 0 (210) on that character, move again.

Clearly, either of the patterns will be recognized if they begin with the machine in state 0 (210).
Consider, however, the input stream "freel or eeel". It contains two instances of "eel", but
neither will be recognized, because the machine will not be in state 0 (210) when the first 'e' of
"eel" is encountered. It is now time to apply the second method, completing the graph.
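The coin procedure on the partially built machine can be simulated in Python. The transition table here is transcribed by hand from the description of states 0-7 (seven transitions); the names `PARTIAL`, `ACCEPTING` and `run_coin` are assumptions for the example.

```python
# Partially built machine of Fig. 2 (seven transitions), transcribed by hand.
PARTIAL = {
    0: {"f": 1, "e": 5},
    1: {"r": 2},
    2: {"e": 3},
    3: {"e": 4},
    4: {},
    5: {"e": 6},
    6: {"l": 7},
    7: {},
}
ACCEPTING = {4: "free", 7: "eel"}


def run_coin(text):
    """Slide the "coin" over the cards; return the patterns recognized."""
    found = []
    coin = 0
    for ch in text:
        if ch in PARTIAL[coin]:
            coin = PARTIAL[coin][ch]
        else:
            # Two moves: back to state 0, then out of state 0 if possible.
            coin = PARTIAL[0].get(ch, 0)
        if coin in ACCEPTING:
            found.append(ACCEPTING[coin])
    return found
```

Running it on "freel or eeel" recognizes only "free", which is the shortfall that completing the graph is meant to repair.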

Completing the graph

In the following we will refer to the state moved to on the first character of a string as the
"first state" for that string, and the character which caused that move, the "first character".
Similarly, at any point, the current state is the (nth) state moved to on the current (nth) character.

For each pattern we will typically perform the following steps. Start with the initial state
and move to the first state for that pattern. Using the transition on the 2nd character of our
pattern, move to the second state. From this point on, repeat the following through the
accepting/final state.

Check the initial state to see if there is a transition on the current character of the pattern. If
one is found, we place the card representing that transition's state next to the current state card.
We then copy all transitions from that state to our current state, excepting any transitions on our
current character. In addition, if that state is an accepting state, we add that information to our
current state.

If we have placed state cards next to our previous state card, check to see if there is a
transition on our current character out of that state. If there is, place the state card for that
transition next to our current state card, and copy all transitions from that state to our current
state, excepting any transition on our current character, copying our accepting state information,
if any, as well. Move to the next state, using the transition on the next character.

For our example machine, we apply the technique above as follows: 

For our first pattern, "free", we move according to the transition 'f' 1 from state 0 to state
1. We then move to state 2 on 'r'. Then we check the initial state to see if there is a transition on
our current character, 'r'. There is none. We now make the transition on 'e' to state 3. We
check state 0 for a transition on 'e'. There is one, to state 5. We place the state 5 card next to
our current state (3) card, and cannot copy the transition "e, 6", since we already have a transition
on 'e'. We then move to state 4 on 'e', and move on 'e' from the previously placed state 5 card to
state 6, place the state 6 card next to our current state, copying the 'l' 7 transition to our current
state. Looking again at state 0, we find the transition on 'e', copying the transition from the state
5 card.

We follow the same procedure for our second pattern, "eel". Fig. 3 demonstrates the 
steps we have just described. 

After having repeated the above procedure on the two patterns, "free" and "eel", we now
have a machine which has 10 transitions or edges, as opposed to the original 7, and is
capable of matching each of our patterns, no matter where they occur in any input stream,
including overlapping patterns. Fig. 4 is a state transition diagram for our completed FSA for
"free" and "eel".

State transition diagram for completed FSA

Our prototype is, by nature, object-oriented, i.e. each state is represented in such a way as
to encapsulate the data (transitions) which define where to move on a given input. We will now
describe, using an object-oriented form of pseudocode, how we implement our method to create
a software version of our machine.

Thus, specific embodiments and applications of an object approach to parallel pattern
matching have been disclosed. It should be apparent, however, to those skilled in the art that
many more modifications besides those already described are possible without departing from the
inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in
the spirit of the appended claims. Moreover, in interpreting both the specification and the claims,
all terms should be interpreted in the broadest possible manner consistent with the context. In
particular, the terms "comprises" and "comprising" should be interpreted as referring to
elements, components, or steps in a non-exclusive manner, indicating that the referenced
elements, components, or steps may be present, or utilized, or combined with other elements,
components, or steps that are not expressly referenced.